Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System
نویسندگان
چکیده
This paper presents research on building a model of grammatical error correction, for preposition errors in particular, in English text produced by language learners. Unlike most previous work which trains a statistical classifier exclusively on well-formed text written by native speakers, we train a classifier on a large-scale, error-tagged corpus of English essays written by EFL learners, relying on contextual and grammatical features surrounding preposition usage. First, we show that such a model can achieve high performance values: 93.3% precision and 14.8% recall for error detection and 81.7% precision and 13.2% recall for error detection and correction when tested on preposition replacement errors. Second, we show that this model outperforms models trained on well-edited text produced by native speakers of English. We discuss the implications of our approach in the area of language error modeling and the issues stemming from working with a noisy data set whose error annotations are not exhaustive.
منابع مشابه
The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings
English as a Second Language (ESL) learners’ writings contain various grammatical errors. Previous research on automatic error correction for ESL learners’ grammatical errors deals with restricted types of learners’ errors. Some types of errors can be corrected by rules using heuristics, while others are difficult to correct without statistical models using native corpora and/or learner corpora...
متن کاملRussian Error-Annotated Learner English Corpus: a Tool for Computer-Assisted Language Learning
The paper describes the learner corpus composed of English essays written by native Russian speakers. REALEC (Russian Error-Annotated Learner English Corpus) is an error-annotated, available online corpus, now containing more than 200 thousand word tokens in almost 800 essays. It is one of the first Russian ESL corpora, dynamically developing and striving to improve both in size and in features...
متن کاملSpoken English Learner Corpora
In this paper we present a survey of some most significant spoken English learner corpora created up to date. Spoken learner corpora which include speech generated by learners are important in many areas of research and practice, in particular, for identifying typical pronunciation errors of learners of English as a second language (ESL), English as a foreign language (EFL), or English as a lin...
متن کاملHOO 2012 Error Recognition and Correction Shared Task: Cambridge University Submission Report
Previous work on automated error recognition and correction of texts written by learners of English as a Second Language has demonstrated experimentally that training classifiers on error-annotated ESL text generally outperforms training on native text alone and that adaptation of error correction models to the native language (L1) of the writer improves performance. Nevertheless, most extant m...
متن کاملEAGLE: an Error-Annotated Corpus of Beginning Learner German
This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic propertie...
متن کامل